DOCUMENT RESUME 

ED 307 884 IR 052 772 



AUTHOR       Liddy, Elizabeth D.
TITLE        Discourse-Level Structure in Abstracts.
PUB DATE     Oct 87
NOTE         11p.; Paper presented at the Annual Meeting of the
             American Society for Information Science (Boston, MA,
             October 4-8, 1987).
PUB TYPE     Reports - Research/Technical (143) --
             Speeches/Conference Papers (150)

EDRS PRICE   MF01/PC01 Plus Postage.
DESCRIPTORS  *Abstracts; *Componential Analysis; *Discourse
             Analysis; Information Retrieval; Matrices; *Syntax;
             *Text Structure; Users (Information)
IDENTIFIERS  *ERIC; *PsycINFO



ABSTRACT 

An investigation was undertaken into the possibility 
of automatically detecting how concepts exist in relation to each 
other in abstracts, a text-type commonly used in free-text retrieval.
The end goal of this research is to capture these relationships in
structured representations of abstracts' contents so that users can
require not only that the concepts of interest to them co-occur in
the retrieved documents, but also that the roles they play in
relation to one another are the ones of interest. Four tasks found 
useful in revealing other schema were performed by expert 
abstractors. The results were analyzed and used as the basis for 
developing a frame-like structure of abstracts reporting on empirical 
work. A discourse linguistic analysis of a sample of 276 abstracts 
identified the lexical/syntactic clues which could be used by a 
system to automatically instantiate the frame-like structure of 
individual abstracts. The text is supplemented by four tables and 
three figures. (10 references) (Author) 









DISCOURSE-LEVEL STRUCTURE IN ABSTRACTS 
Elizabeth D. Liddy
Syracuse University, School of Information Studies, Syracuse,






Abstract. An investigation was undertaken into the possibility of
automatically detecting how concepts exist in relationship to each
other in abstracts, a text-type commonly used in free-text retrieval.
The end goal of this research is to capture these relationships in
structured representations of abstracts' contents so that users can
require not only that the concepts of interest to them co-occur in
the retrieved documents, but also that the roles they play in
relation to each other are the ones of interest. Four tasks found
useful in revealing other schema were performed by expert
abstractors. The results were analyzed and used as the basis of
developing a frame-like structure of abstracts reporting on empirical
work. A discourse linguistic analysis of a sample of 276 abstracts
identified the lexical/syntactic clues which could be used by a
system to automatically instantiate the frame-like structure of
individual abstracts.



OVERVIEW 

While free-text searching has improved to some extent an information
system's ability to retrieve only those documents of interest to a
user, it still does not produce results sufficiently refined for
those users who can specify quite precisely what the content of
relevant documents should consist of. This is because current
free-text retrieval permits users to require only that concepts of
interest to them co-occur in a document. As a result, many
nonrelevant documents are retrieved, because the search mechanism
cannot require the concepts to be in the relationship needed by the
user [1]. And although there are search techniques which require the
desired concepts to be in some particular linear order or adjacency
distance within the abstract, there are none that require the desired
concepts to be in specified semantic relationships.

In an attempt to improve on this situation, an investigation was
undertaken into the possibility of automatically detecting how
concepts exist in relationship to each other in empirical abstracts,
a text-type commonly used in free-text retrieval. The goal of this
research is to capture these relationships in structured
representations of abstracts' contents so that users can request not
only that concepts of interest occur in the retrieved documents, but
also that these concepts exist in the desired semantic relationships.
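
The contrast between co-occurrence retrieval and retrieval on semantic
relationships can be illustrated with a minimal sketch. Everything in
it is an assumption made for illustration: the record layout, the
function names, the relation label "affects", and the two sample
abstracts are hypothetical and do not come from the study.

```python
from dataclasses import dataclass, field


@dataclass
class AbstractRecord:
    doc_id: str
    text: str
    # Hypothetical structured representation: (concept, relation, concept) triples.
    relations: set = field(default_factory=set)


def cooccurrence_search(records, concepts):
    """Free-text style search: every concept must appear somewhere in the text."""
    return [
        r.doc_id
        for r in records
        if all(c.lower() in r.text.lower() for c in concepts)
    ]


def relation_search(records, subject, relation, obj):
    """Relation-constrained search: the concepts must stand in the given relation."""
    return [r.doc_id for r in records if (subject, relation, obj) in r.relations]


# Two invented abstracts that share the same concepts in opposite roles.
RECORDS = [
    AbstractRecord(
        "A1",
        "Effects of anxiety on test performance in college students.",
        {("anxiety", "affects", "test performance")},
    ),
    AbstractRecord(
        "A2",
        "Effects of poor test performance on anxiety in adolescents.",
        {("test performance", "affects", "anxiety")},
    ),
]

both = cooccurrence_search(RECORDS, ["anxiety", "test performance"])
only_a1 = relation_search(RECORDS, "anxiety", "affects", "test performance")
```

In this toy collection the co-occurrence search returns both documents,
while the relation-constrained search returns only the one in which
the concepts play the requested roles.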



BACKGROUND 

The belief that a structure exists in abstracts arises from work done
in discourse linguistics, which is concerned with the study of units
of language larger than a sentence. These larger units are referred
to as texts, and have been the focus of increasing study in
linguistics, artificial intelligence and natural language processing.
One line of investigation in discourse linguistics has been the
detection of a particularized structure within a given text type.
Text types found to exhibit characteristic syntactic and semantic
organization with predictable consistency within that type include
folk tales [2], narratives [3], and scholarly papers [4]. The
research being reported here has extended this line of investigation
by discovering and delineating the structure of the text-type of
empirical abstracts.



The theoretical basis of this work derives partially from research
done in cognitive science showing that human understanding requires
efficient schemes for the organization of knowledge. One of the most
widely accepted knowledge organizing theories is Minsky's frame
structure theory [5]. A frame is a learned data-structure originally
proposed as a formalism for explaining human vision and later used
for describing human memory. The frame formalism has been useful in
research in human text understanding and has been successfully
extended for use in a variety of computerized text understanding
systems (see [6] for examples).

The current study suggests that in the same way that a frame serves
as a formalism for representing text type structures in memory, a
frame structure can be detected in the text itself. In addition, the
investigation was concerned with showing that the specific lexical
clues which indicate to humans how to instantiate their mental frame
of a particular text type are rule-governed enough to permit
automatic instantiation of a frame structure for individual empirical
abstracts.






A structure consists of components and the relations among them. In
text structure, the components are those necessary categories of text
content which define the text type. Relations are properties that
hold between two or more entities and define the type of interaction,
influence or simply co-occurrence that holds between the entities.
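
A structure of components plus relations can be pictured as a small
data structure. The following is a minimal sketch, not the
representation used in the study: the class names, method names, and
the relation label "tested on" are hypothetical, and only the
component names are taken from the paper's tables.

```python
from dataclasses import dataclass, field


@dataclass
class Relation:
    source: str  # component the relation starts from
    target: str  # component the relation points to
    label: str   # word or phrase describing the relation


@dataclass
class AbstractFrame:
    # Component name -> text filling that slot (None while unfilled).
    slots: dict = field(default_factory=dict)
    relations: list = field(default_factory=list)

    def fill(self, component, text):
        """Place a piece of abstract text into a known component slot."""
        if component not in self.slots:
            raise KeyError(f"unknown component: {component}")
        self.slots[component] = text

    def relate(self, source, target, label):
        """Record a labeled relation between two known components."""
        for name in (source, target):
            if name not in self.slots:
                raise KeyError(f"unknown component: {name}")
        self.relations.append(Relation(source, target, label))


def empty_frame(components):
    """Create a frame whose slots exist but are not yet instantiated."""
    return AbstractFrame(slots={name: None for name in components})


frame = empty_frame(["hypothesis", "subjects", "methodology", "findings"])
frame.fill("subjects", "14 professional abstractors")
frame.relate("hypothesis", "subjects", "tested on")  # illustrative label only
```

Keeping slots and relations separate mirrors the distinction drawn
above: a slot holds the content of one component, while a relation
records how two components bear on each other.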






METHODOLOGY 

The question of whether there is a predictable, framelike structure
in abstracts reporting on empirical work was investigated by tapping
the expertise of professional abstractors to delineate the components
and relations which comprise the abstract frame structure. This was
done by means of four tasks employing methodology similar to that
used in cognitive psychology research to uncover various schemata
[7, 8, 9].

Task 1, a free-generation task, was administered by mail to 14
professional abstractors from either ERIC or PsycINFO. These
abstractors were simply asked to list all the components of
information that are included in an abstract of an empirical study.
For the remaining tasks, each subject used the complete list of
components generated by all the abstractors from their respective
service.

Tasks 2, 3 and 4 were administered in person at the facility of each
abstractor. The tasks were administered in small groups of two to
four subjects and the three tasks took a total of about 1 and 1/2 to
2 hours of a subject's time.

Task 2 asked the subjects to first indicate which of the components
in the list were, to their way of thinking, the most typical of an
empirical abstract. They were to then go back through the list and
mark the components they considered to be of the next level of
prototypicality. This process was to be continued as long as the
subjects felt there were differences in degree of typicality.

In Task 3, each subject was given a pack of cards, each card
containing the name of a component from the list used in Task 2, plus
written instructions for a multiple sorting procedure. A multiple
sorting procedure simply asks subjects to assign elements to
categories of their own choosing [10]. The value of the procedure is
that no preconceived limitations are set on how the subject is to
perform the sort. The method is ideal for this research, since it
allows the subject to impose whatever structure they desire on the
components.

Subjects were asked to spread the cards out and then sort them into
groups in such a way that all the cards in each group had something
in common. Subjects were allowed to perform as many different sorts
as they wanted.

Finally, Task 4 served to identify the semantic relations comprising
the frame structure of empirical abstracts. Subjects were instructed
to draw lines from one component to the other components with which,
in their opinion, there was a relationship and to write on the
connecting line some word or words to describe that relationship.



RESULTS 

The components freely generated in Task 1 were normalized so that
synonymous ways of referring to the same component were reduced to a
canonical term or phrase. Abstractors from PsycINFO generated 24
components and the abstractors from ERIC generated 35 components,
with 15 of these components common to both groups of abstractors.
Table 1 contains all the components generated with the number of
abstractors who suggested each component.

Of the ten ERIC abstractors who participated in Task 1, only eight
were available to participate in Tasks 2-4, while all four
abstractors from PsycINFO participated. The results from these
abstractors on Task 2 produce the ranked ordering of components of an
empirical abstract and their typicality scores seen in Table 2. The
subjects' original typicality values were reverse coded and then
converted to proportions so that all components judged as being at
the highest level of typicality equal 1 no matter how many levels of
typicality an individual judge may have used. These scores were then
averaged and the averages for the 15 components mentioned by both
sets of abstractors were summed.
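
The scoring step can be sketched as follows. The paper does not give
the exact formula, so this is one plausible reading, stated as an
assumption: a judge who used k levels (1 = most typical) has each
level reverse coded as k + 1 - level and divided by k, so that level 1
always maps to 1. The function names and the two example judges are
hypothetical, and every judge is assumed to have rated every
component.

```python
def typicality_proportions(levels):
    """Convert one judge's levels (1 = most typical) to scores in (0, 1].

    Assumed formula: reverse code as k + 1 - level, then divide by k,
    where k is the deepest level this judge used.
    """
    k = max(levels.values())
    return {component: (k + 1 - level) / k for component, level in levels.items()}


def average_scores(judges):
    """Average the per-judge proportions for every component."""
    collected = {}
    for levels in judges:
        for component, score in typicality_proportions(levels).items():
            collected.setdefault(component, []).append(score)
    return {c: sum(scores) / len(scores) for c, scores in collected.items()}


# Invented ratings: judge_a used two levels, judge_b used four.
judge_a = {"methodology": 1, "findings": 1, "purpose": 2, "references": 2}
judge_b = {"methodology": 1, "findings": 2, "purpose": 3, "references": 4}

averages = average_scores([judge_a, judge_b])
```

Under this reading a component placed at the top level by every judge
averages exactly 1, regardless of how many levels each judge chose to
use. Summing the ERIC and PsycINFO averages then gives a total with a
maximum of 2 for components common to both services.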

As can be seen from comparing the ordering of the 15 common
components based on typicality ratings in Table 2 with the ordering
based on frequency of free generation of components in Table 1,
having subjects assign typicality scores to a prepared list of
components changes the relative ordering to some extent. This is not
surprising, however, since recall and recognition are known to be
very different memory tasks and a component which was simply not
recalled by an individual abstractor in the free generation task may
later be recognized as quite typical of an empirical abstract.

Table 3 presents a final ranked ordering of the 15 common components
based on the combined results of Task 1, the free-generation task,
and Task 2, the typicality rating task. Although these tasks are
admittedly different in nature, the rankings in Table 3 present a
preliminary indication of the relative significance of these
components in the mental framework of this group of expert
abstractors.
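
The combination of the two rankings can be reproduced with a short
sketch. It assumes that the final rank is obtained by summing the
Task 1 and Task 2 ranks and ranking the sums, with tied sums sharing
the mean of the positions they occupy, which is consistent with the
half-ranks in Table 3. The rank values are transcribed from Table 3;
where the scanned table is internally inconsistent, the values used
are the ones that make the printed sums add up. The function name is
hypothetical.

```python
def average_ranks(scores):
    """Rank values ascending (1 = smallest); ties share the mean rank."""
    ordered = sorted(scores.items(), key=lambda item: item[1])
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(ordered) and ordered[j + 1][1] == ordered[i][1]:
            j += 1
        mean_rank = ((i + 1) + (j + 1)) / 2
        for k in range(i, j + 1):
            ranks[ordered[k][0]] = mean_rank
        i = j + 1
    return ranks


# (Task 1 rank, Task 2 rank) for the 15 common components.
TASK_RANKS = {
    "methodology": (3, 1),
    "findings": (4.5, 2),
    "hypothesis": (1.5, 5),
    "results": (4.5, 3),
    "subjects": (1.5, 6),
    "purpose": (6, 4),
    "conclusions": (8, 7),
    "references": (11, 9),
    "discussion": (10, 11),
    "implications": (8, 13),
    "relation to other research": (8, 14),
    "research design": (15, 8),
    "sample selection technique": (13, 10),
    "intended use/practical applications": (13, 12),
    "conditions/treatments": (13, 15),
}

rank_sums = {name: t1 + t2 for name, (t1, t2) in TASK_RANKS.items()}
final_ranks = average_ranks(rank_sums)
```

A rank sum gives both tasks equal weight, which fits the paper's
description of the result as a preliminary indication rather than a
calibrated measure.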

From Task 3, the free-sorting task, only the results based on one
type of sort, the grouped-ordering sort, are reported here. This was
the most commonly used scheme for sorting (10 out of 12 subjects) and
also a source of essential information in constructing a predictable
frame structure. Sorting on this parameter provided not only the
higher level structuring of empirical abstracts but also information
as to which components co-occur within each of these
'meta-components'.

For illustration, the sort of one subject, who made and orally
labeled five piles of cards, is presented in Figure 1. Listed beneath
each pile's label are the abstract components designated by the
subject as belonging to that group.

Using the grouped-ordering sorts of the 10 abstractors, matrices of
the frequency with which each of the 15 common components was placed
in the same group as every other component were constructed for
1) ERIC, 2) PsycINFO and 3) a composite of both. The composite matrix
is presented in Table 4.
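
Building such a matrix from card sorts can be sketched as below. This
is an illustration under stated assumptions rather than the procedure
actually used: each sort is taken to be a list of piles, each pile a
list of component names, and a pair of components scores one count for
every subject who put both in the same pile. The function names and
the two example sorts are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(sorts):
    """Count, per component pair, how many sorts put the pair in one pile."""
    counts = Counter()
    for piles in sorts:
        for pile in piles:
            # Sorting the names gives each pair one canonical ordering.
            for a, b in combinations(sorted(set(pile)), 2):
                counts[(a, b)] += 1
    return counts


def as_matrix(counts, components):
    """Lay the pair counts out as a symmetric matrix in the given order."""
    index = {name: i for i, name in enumerate(components)}
    matrix = [[0] * len(components) for _ in components]
    for (a, b), value in counts.items():
        if a in index and b in index:
            matrix[index[a]][index[b]] = value
            matrix[index[b]][index[a]] = value
    return matrix


# Two invented sorts over a subset of the common components.
sort_1 = [["hypothesis", "purpose"],
          ["subjects", "methodology"],
          ["findings", "results", "conclusions"]]
sort_2 = [["hypothesis", "purpose"],
          ["subjects"],
          ["methodology"],
          ["findings", "results"],
          ["conclusions"]]

ORDER = ["hypothesis", "purpose", "subjects", "methodology",
         "findings", "results", "conclusions"]
counts = cooccurrence_counts([sort_1, sort_2])
matrix = as_matrix(counts, ORDER)
```

Separate matrices for each service, as described above, would follow
from calling the same function on the sorts of that service's
abstractors only.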

Figure 2 is a graphic representation of the 15 common components
using the matrix values in Table 4. This representation, which is to
be read clockwise from the upper left-hand corner, is intended to
convey more clearly a notion of the basic structure existing within
such abstracts. The lines encircling the three groupings are
arbitrarily sketched, but can be seen to enclose sets of components
which exist in very strong and inter-connected associations with each
other.



The results of Task 4, which asked abstractors to specify the
relations they see as existing among abstract components, were quite
extensive and will not be presented here in their entirety. Figure 3
does serve to suggest the type of relations offered by abstractors by
adding to each link a lexical expression of one semantic relation
offered by abstractors.


CONCLUSIONS

The nature of an abstract's frame structure uncovered in the results
of the four tasks reported above is currently being used to guide the
search for rules governing the ways this structure is revealed by
lexical clues. In order to demonstrate that the frame structure of
empirical abstracts can be useful in information retrieval tasks, it
is essential to show that this structure can be automatically
detected, and a frame structure actually instantiated for each
individual empirical abstract processed. Ongoing research will show
how the guidance offered by the expert-generated structure was used
to develop lexical clue recognition rules and how these rules, when
applied to a sample set of empirical abstracts, produce structured
representations.

Results of the next stage of the research, which is currently nearing
completion, will indicate whether rule-governed instantiation of the
abstract frame structure can be accomplished. Positive results would
support the feasibility of automatic processing of abstracts to fill
the slots of an abstract frame. Automatic instantiation would produce
a representation containing not only the substantive content of an
abstract's components but also indicating which frame component the
information belongs to and how this information is related to other
information in the abstract. Such representations offer the potential
for producing retrieval results of greater precision.



NOTES

1. C. Borgman, D. Moghdam & P. Corbett, Effective Online Searching
   (New York: Marcel Dekker, 1984).

2. V. Propp, Morphology of the Folk-tale (L. Scott, Trans.).
   (Bloomington: Indiana University Press, 1958). (Original work
   published 1929).

3. R. Longacre, The Grammar of Discourse (New York: Plenum Press,
   1983).

4. T. A. van Dijk, Macrostructures: An Interdisciplinary Study of
   Global Structures in Discourse, Interaction, and Cognition
   (Hillsdale, NJ: Lawrence Erlbaum Associates, 1980).

5. M. Minsky, "A Framework for Representing Knowledge." In P. Winston
   (Ed.), The Psychology of Computer Vision (New York: McGraw-Hill,
   1975), 211-277.

6. D. Metzing (Ed.), Frame Conceptions and Text Understanding (New
   York: Walter de Gruyter, 1980).

7. G. Bower, J. Black & T. Turner, "Scripts in Memory for Text,"
   Cognitive Psychology, 11 (1979), 177-220.

8. N. Cantor, "A Cognitive-Social Approach to Personality." In
   N. Cantor & J. Kihlstrom (Eds.), Personality, Cognition, and
   Social Interaction (Hillsdale, NJ: Lawrence Erlbaum Associates,
   1981), 23-44.

9. A. Graesser & S. Goodman, "Implicit Knowledge, Question Answering,
   and the Representation of Expository Text." In B. Britton &
   J. Black (Eds.), Understanding Expository Texts: A Theoretical and
   Practical Handbook for Analyzing Explanatory Text (Hillsdale, NJ:
   Lawrence Erlbaum Associates, 1985), 109-171.

10. D. Canter, J. Brown & L. Groat, "A Multiple Sorting Procedure for
    Studying Conceptual Systems." In M. Brenner, J. Brown & D. Canter
    (Eds.), The Research Interview: Uses and Approaches (London:
    Academic Press, 1985), 79-114.



Table 1: Frequency of Component Generation

COMPONENT                              ERIC         PsycINFO     Total
                                       (N=10)       (N=4)        (N=14)

GENERATED BY BOTH SERVICES
hypothesis                             10           3            13
subjects                               9            4            13
methodology                            8            3            11
findings                               7            3            10
results                                8            2            10
purpose                                4            4            8
conclusions                            4            3            7
[illegible: implications or
  relation to other research]          4            3            7
[illegible: implications or
  relation to other research]          5            2            7
discussion                             3            2            5
references                             2            2            4
conditions/treatments                  1            2            3
sample selection technique             1            2            3
intended use/practical applications    [illegible]  [illegible]  3
research design                        1            1            2

ERIC ONLY
future research needs                  7                         7
data analysis                          4                         4
institution doing study                4                         4
location of study                      4                         4
time frame of study                    4                         4
appendices included                    3                         3
dependent variable                     3                         3
independent variable                   3                         3
administrators of study                2                         2
background                             2                         2
confounding variables                  2                         2
intended audience                      2                         2
tables included                        2                         2
data collection
limitations
new terms defined
reliability of findings
subsequent research planned
unique features of study

PsycINFO ONLY
tests                                               4            4
drugs administered                                  3            3
procedures                                          3            3
apparatus                                           2            2
significance of findings                            2            2
control population
materials
number of experiments
research question
scope




Table 2: Rankings Based on Averaged Typicality Scores

COMPONENT                              ERIC     PsycINFO   TOTAL

COMMON TO BOTH SERVICES
methodology                            1        1          2
findings                               .975     1          1.975
results                                .950     1          1.950
purpose                                .944     1          1.944
hypothesis                             .938     1          1.938
subjects                               .925     1          1.925
conclusions                            .975     .938       1.913
research design                        .901     .938       1.839
references                             .576     1          1.576
sample selection technique             .598     .915       1.513
discussion                             .791     .56        1.351
intended use/practical applications    .739     .56        1.299
implications                           .72      .56        1.28
relation to other research             .589     .642       1.231
conditions/treatments                  .498     .688       1.186

ERIC ONLY
data collection                        .851                .851
unique features of study               .788                .788
data analysis                          .77                 .77
time frame of study                    .765                .765
background                             .76                 .76
dependent variable                     .749                .749
tables included                        .701                .701
independent variable                   .696                .696
appendices included                    .67                 .67
intended audience                      .639                .639
future research needs                  .625                .625
institution doing study                .622                .622
limitations                            .599                .599
location of study                      .592                .592
confounding variables                  .549                .549
reliability of findings                .499                .499
subsequent research planned            .49                 .49
administrators of study                .485                .485
new terms defined                      .448                .448

PsycINFO ONLY
control population
drugs administered
number of experiments
research question
tests
procedures
significance of findings
apparatus
scope
materials

[Only five PsycINFO-only scores are legible in the scan, and their
row alignment is not recoverable: .915, .83, .705, .645, and .496
(total printed as .498).]



Table 3: Ranking Based on Tasks 1 & 2

COMPONENT                              TASK 1   TASK 2   SUM OF   FINAL
                                       RANK     RANK     RANKS    RANK

methodology                            3        1        4        1
findings                               4.5      2        6.5      2.5
hypothesis                             1.5      5        6.5      2.5
results                                4.5      3        7.5      4.5
subjects                               1.5      6        7.5      4.5
purpose                                6        4        10       6
conclusions                            8        7        15       7
references                             11       9        20       8
discussion                             10       11       21       9.5
implications                           8        13       21       9.5
relation to other research             8        14       22       11
research design                        15       8        23       12.5
sample selection technique             13       10       23       12.5
intended use/practical applications    13       12       25       14
conditions/treatments                  13       15       28       15



Subject 4 - PsycINFO

RESEARCH QUESTION
    research question
    hypothesis
    scope
    purpose

SUBJECT POPULATION
    no. of experiments
    sample selection
    subjects
    control population

METHODOLOGY
    methodology
    apparatus
    procedures
    materials
    research design
    conditions
    tests
    drugs administered

FINDINGS
    results
    findings
    significance
    conclusions
    discussion

RESULTS APPLIED
    practical applications
    implications
    relation to research

Figure 1: Example of One Grouped-Ordering Sort



Table 4: Co-occurrence of Components in Same Group

                               2  3  4  5  6  7  8  9 10 11 12 13 14 15

 1. methodology
 2. findings
 3. hypothesis
 4. results
 5. subjects
 6. purpose
 7. conclusions
 8. references
 9. discussion
10. implications
11. relation to research
12. research design
13. sample selection
14. intended use
15. conditions

[The matrix values are not legible in the scan; only the row and
column labels survive.]



